-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for narrowing Literals using equality #8151
Add support for narrowing Literals using equality #8151
Conversation
Also -- this PR was written on top of #8148. I can rebase this once the other PR lands. I also ran this on one of our internal codebases and confirmed there were no new errors, with the exception of one or two (legitimate) "unnecessary cast" errors. |
This pull request is a long-overdue update of the Literal type docs. It: 1. Removes the "this is alpha" warning we have at the top. 2. Mentions Literal enums are a thing (and works in a brief example of one). 3. Adds a section about "intelligent indexing". 4. Adds a section about the "tagged union" pattern (see python#8151). I made the example focus on the TypedDict/JSON use case -- IMO that's really the only realistically useful use case for the pattern. 5. Cross-references the "tagged union" docs with the TypedDicts docs -- IMO, tagged unions are mostly useful when you're working with JSON I also thought about making the "Unions of TypedDict" section I added to the TypedDicts doc mention using [pydantic][0] as an alternative to the "tagged union" pattern. I personally prefer using libraries like this which handle validation and let me use regular classes (and `isinstance`) instead of dicts, but I wasn't sure whether we want to be recommending 3rd party libraries so held off for now. [0]: https://pydantic-docs.helpmanual.io
This pull request is a long-overdue update of the Literal type docs. It: 1. Removes the "this is alpha" warning we have at the top. 2. Mentions Literal enums are a thing (and works in a brief example of one). 3. Adds a section about "intelligent indexing". 4. Adds a section about the "tagged union" pattern (see python#8151). I made the example focus on the TypedDict/JSON use case -- IMO that's really the only realistically useful use case for the pattern. 5. Cross-references the "tagged union" docs with the TypedDicts docs -- IMO, tagged unions are mostly useful when you're working with JSON I also thought about making the "Unions of TypedDict" section I added to the TypedDicts doc mention using [pydantic][0] as an alternative to the "tagged union" pattern. I personally prefer using libraries like this which handle validation and let me use regular classes (and `isinstance`) instead of dicts, but I wasn't sure whether we want to be recommending 3rd party libraries so held off for now. [0]: https://pydantic-docs.helpmanual.io
I'm glad to see this landing. My comment on your second type-narrowing sample, where checking the state limits the operations available, I would argue that the TypeScript doesn't support that use case, and I don't feel like you should have to, either. |
@fluggo -- ah, sorry. To clarify, the last example isn't attempting to narrow the type of Basically, the core problem is that just blindly applying the way mypy normally narrows variables ends up causing too many issues. For example, here's how mypy currently narrows the following program: class A: pass
class B: pass
def test(x: object) -> None:
assert isinstance(x, A)
reveal_type(x) # Revealed type is 'A'
assert isinstance(x, B)
reveal_type(x) # No output: stmt is unreachable; mypy thinks the assert is always false In this case, the behavior is mostly reasonable. While 'x' could theoretically be a type that multiply inherits from both A and B, it usually isn't, so concluding that the final assert will always be False and the last line to be unreachable is a reasonable heuristic. (And when/if mypy starts understanding the notion of intersection types, it could maybe infer x is an intersection of A and B, but whatever, that's a separate issue.) So, since Literal types are supposed to be treated more or less as regular subtypes of the underlying type, it might initially make sense to do the same thing there: def test(x: int) -> None:
assert x == 10
reveal_type(x) # Suppose we narrowed so the revealed type is Literal[10]
assert x == 20 # Then, mypy would conclude x can't be both 10 and 20
reveal_type(x) # ...and since the assert is always false, this is unreachable Attempting to narrow is reasonable in this specific case, since the last assert genuinely will always be false, since there's no way for the value of 'x' to abruptly change in after the first assert. But the expression we're narrowing can abruptly change if it's, say, some object attribute instead of just a variable. The 'worker' example I had was just trying to show a realistic example of where this happens: a method changes some internal field, and we check the new value is what we expected in some test case. This "what if a method changes an attribute and causes our narrowing to be bogus" problem also appears when narrowing regular types with That is, while our current heuristics for type-level narrowing are sometimes mildly problematic, attempting apply them for value-level narrowing ends up being actually problematic. So we need to adjust how we do value-level narrowing in some way (disable them for asserts? change the behavior of asserts so they override instead of narrowing? do the narrowing only if the attribute is And in practice, this ends up meaning that the thing you're narrowing needs to be a Union of Literals. (This conveniently happens to be just what you need if you want tagged unions w/o having to use enum everywhere.) That said, I'm not really sure whether this is good enough either. Sure, it'll ensure that the 'worker' example doesn't end up with any unreachable statements if ¯\_(ツ)_/¯ |
I see what you're saying now. I ran the same sample through TypeScript, and I was surprised to find that the same example fails there: class Worker {
state: 'ready' | 'running';
start() {
this.state = 'running';
}
query() {
}
}
function someTestCase(): void {
const worker = new Worker();
if(worker.state !== 'ready')
throw new Error('');
worker.start();
if(worker.state !== 'running') // Error: This condition will always return 'true' since the types '"ready"' and '"running"' have no overlap. ts(2367)
throw new Error('');
worker.query();
} It seems the solution in TypeScript is exactly as you proposed: don't define a "status"-like field as type Literal. TypeScript itself follows this principle for XmlHttpRequest, which defines the |
I will review this right after #8148 is merged (currently it is a bit too large to review, sorry) |
This pull request (finally) adds support for narrowing expressions using Literal types by equality, instead of just identity. For example, the following "tagged union" pattern is now supported: ```python class Foo(TypedDict): key: Literal["A"] blah: int class Bar(TypedDict): key: Literal["B"] something: str x: Union[Foo, Bar] if x.key == "A": reveal_type(x) # Revealed type is 'Foo' else: reveal_type(x) # Revealed type is 'Bar' ``` Previously, this was possible to do only with Enum Literals and the `is` operator, which is perhaps not very intuitive. The main limitation with this pull request is that it'll perform narrowing only if either the LHS or RHS contains an explicit Literal type somewhere. If this limitation is not present, we end up breaking a decent amount of real-world code -- mostly tests -- that do something like this: ```python def some_test_case() -> None: worker = Worker() # Without the limitation, we narrow 'worker.state' to # Literal['ready'] in this assert... assert worker.state == 'ready' worker.start() # ...which subsequently causes this second assert to narrow # worker.state to <uninhabited>, causing the last line to be # unreachable. assert worker.state == 'running' worker.query() ``` I tried for several weeks to find a more intelligent way around this problem, but everything I tried ended up being either insufficient or super-hacky, so I gave up and went for this brute-force solution. The other main limitation is that we perform narrowing only if both the LHS and RHS do not define custom `__eq__` or `__ne__` methods, but this seems like a more reasonable one to me. Resolves python#7944.
77622f9
to
f82a019
Compare
@ilevkivskyi -- fyi, I finished rebasing this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The general idea looks right to me, I only have a bunch of minor comments.
mypy/checker.py
Outdated
|
||
|
||
# TODO: why can't we define this as an inline function? | ||
# Does mypyc not support them? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean lambda by "inline function"? What exactly goes wrong when you try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I actually don't remember. I'll try experimenting with this a little later today and see if I can repro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah, I don't seem to be able to repro this anymore? Maybe it was fixed after I rebased or maybe I just doing something wrong before, but defining an inline function seems to be working fine now.
Defining a lambda unfortunately causes flake8 to complain -- it doesn't like it when you assign a lambda to a variable.
Anyways, I moved this function back to where it's being used.
@@ -655,7 +674,7 @@ def coerce_to_literal(typ: Type) -> ProperType: | |||
enum_values = get_enum_values(typ) | |||
if len(enum_values) == 1: | |||
return LiteralType(value=enum_values[0], fallback=typ) | |||
return typ | |||
return original_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is right, thanks!
reveal_type(x5) # N: Revealed type is 'TypedDict('__main__.TypedDict1', {'key': Literal['A'], 'foo': builtins.int})' | ||
else: | ||
reveal_type(x5) # N: Revealed type is 'TypedDict('__main__.TypedDict2', {'key': Literal['B'], 'foo': builtins.str})' | ||
[builtins fixtures/primitives.pyi] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add couple more tests with typed dicts (for example nested typed dicts), since this is pretty important use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to mention this earlier, but I added a few more tests, mostly copying some of the existing ones I had for enums and objects. LMK if there are more test cases you'd like to see.
# is narrowed here -- see 'testNarrowingEqualityFlipFlop' for an example of | ||
# why more precise inference here is problematic. | ||
x_str: str | ||
if x_str == A_final: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand why literal type is "better" than a final name. IMO final names should behave exactly as if they were replaced by their values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's any consolation, we also won't narrow anything away if we do if x_str == "A"
-- I updated the test case with an example.
Basically, I found trying to narrow when we do x_str == "A"
often ended up causing too many false positives, which is why I settled on the heuristic of requiring either the LHS or the RHS to be a Literal[...]
of some sort before we attempt narrowing.
test-data/unit/check-narrowing.test
Outdated
reveal_type(z) | ||
else: | ||
# TODO: Why do we narrow away 'Literal[1]' here? | ||
# Even if the equality comparison is bogus, we should try and do better here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit worrying, so I think this deserves a follow-up issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is a bit embarrassing. I did some more digging and it turned out that the whole concept of "partial TypeMaps" I introduced in my earlier diff was incorrect and that several of the narrowing test cases I added were too optimistic.
Basically, I made mypy narrow the else case in programs like:
x: Literal[1, 2]
y: Literal[1, 2]
if x == 2 == y:
reveal_type(x) # Revealed type is Literal[2]
reveal_type(y) # Revealed type is Literal[2]
else:
reveal_type(x) # Revealed type is incorrectly Literal[1]!
reveal_type(y) # Revealed type is incorrectly Literal[1]!
But actually, we could enter the else case if x == 2 and y == 1 or vice versa, so narrowing was incorrect.
Going back to using just or_conditional_maps
instead of or_partial_conditional_maps
ended up fixing these bogus narrowings. I also added a few more test cases + fixed some earlier enum ones.
This should be ready for a second look whenever fyi. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updates! Looks good now, just one comment.
test-data/unit/check-enum.test
Outdated
reveal_type(x) # N: Revealed type is 'Literal[__main__.Foo.B]' | ||
reveal_type(y) # N: Revealed type is 'Literal[__main__.Foo.B]' | ||
reveal_type(x) # N: Revealed type is '__main__.Foo' | ||
reveal_type(y) # N: Revealed type is '__main__.Foo' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spacing before comments is kind of inconsistent here and below.
This pull request is a long-overdue update of the Literal type docs. It: 1. Removes the "this is alpha" warning we have at the top. 2. Mentions Literal enums are a thing (and works in a very brief example of one). 3. Adds a section about "intelligent indexing". 4. Adds a section with an example about the "tagged union" pattern (see #8151). 5. Cross-references the "tagged union" docs with the TypedDicts docs.
In theory, this draft commit fixes python#8236 However, it looks like the ComparisonExpr part of find_isinstance_check_helper was rewritten today morning in python#8151. It looks like it's not super easy to merge: `is_valid_target` gets in the way and I'm a lot less sure about the change. I'm also not fully sure about the implications of making AssignmentExpr into mypy.literals, but seems like it's what we'd want. I also don't really like the way I've branched for AssignmentExpr, it's pretty unsatisfactory. Well, work in progress!
In theory, this draft commit fixes python#8236 However, it looks like the ComparisonExpr part of find_isinstance_check_helper was rewritten today morning in python#8151. It looks like it's not super easy to merge: `is_valid_target` gets in the way and I'm a lot less sure about the change. I'm also not fully sure about the implications of making AssignmentExpr into mypy.literals, but seems like it's what we'd want. I also don't really like the way I've branched for AssignmentExpr, it's pretty unsatisfactory. Well, work in progress!
This pull request (finally) adds support for narrowing expressions using Literal types by equality, instead of just identity. For example, the following "tagged union" pattern is now supported:
Previously, this was possible to do only with Enum Literals and the
is
operator, which is perhaps not very intuitive.The main limitation with this pull request is that it'll perform narrowing only if either the LHS or RHS contains an explicit Literal type somewhere. If this limitation is not present, we end up breaking a decent amount of real-world code -- mostly tests -- that do something like this:
I tried for several weeks to find a more intelligent way around this problem, but everything I tried ended up being either insufficient or super-hacky, so I gave up and went for this brute-force solution.
The other main limitation is that we perform narrowing only if both the LHS and RHS do not define custom
__eq__
or__ne__
methods, but this seems like a more reasonable one to me.Resolves #7944.